Homework 1

hw1

descriptive statistics

probability

The first homework on descriptive statistics and probability

Author

Yakub Rabiutheen

Published

September 20, 2022

Question 1

a

First, let’s read in the data from the Excel file:

Code

library(readxl)
df <- read_excel("_data/LungCapData.xls")

The distribution of LungCap looks as follows:

Code

hist(df$LungCap,freq = FALSE)

The histogram suggests that the distribution is close to a normal distribution. Most of the observations are close to the mean. Very few observations are close to the margins (0 and 15).

b

The boxplot below compares the lung capacity of males and females.

Code

boxplot(df$LungCap ~ df$Gender)

c

Here is the lung capacity of smokers vs. non-smokers:

Code

boxplot(df$LungCap~df$Smoke,
        ylab = "Capacity", 
        main = "Lung Capacity of Smokers Vs Non-Smokers",
        las = 1)

d

Let’s break it down even further and look at lung capacity by age group. First, the ages are binned into four groups:

Code

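# Bin ages into four groups; the intervals are right-closed, so age 13 falls in "0-13 years".
# The last break is Inf so that "18+ years" has no upper limit.
df$Agegroups <- cut(df$Age,
                    breaks = c(-Inf, 13, 15, 17, Inf),
                    labels = c("0-13 years", "14-15 years", "16-17 years", "18+ years"))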
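To make the binning easier to check, here is a minimal sketch that applies the same cut points (13, 15, 17) to a small vector of ages. The `ages` vector is hypothetical and chosen only to sit on and around the break points; it is not taken from LungCapData.

```r
# Hypothetical ages on and around the break points (not from the data set)
ages <- c(10, 13, 14, 15, 16, 17, 18, 19)

# cut() uses right-closed intervals by default: (-Inf, 13], (13, 15], (15, 17], (17, Inf]
groups <- cut(ages,
              breaks = c(-Inf, 13, 15, 17, Inf),
              labels = c("0-13 years", "14-15 years", "16-17 years", "18+ years"))

# Show which group each age lands in, then count the ages per group
print(data.frame(ages, groups))
print(table(groups))
```

Because the intervals are closed on the right, an age exactly equal to a break point (13, 15 or 17) is assigned to the lower group.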

Below is the lung capacity of each age group, split by gender and without taking smoking status into account.

Code

library(ggplot2)
ggplot(df, aes(x = LungCap, y = Agegroups, fill = Gender)) +
          geom_bar(stat = "identity") +
          coord_flip() +
          theme_classic()

Here is a comparison of lung capacity by age group for smokers vs. non-smokers.

Code

ggplot(df, aes(x = LungCap, y = Agegroups, fill = Smoke)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_classic()

In the comparison above, the lung capacities of smokers and non-smokers look fairly similar.

f

The covariance and correlation between lung capacity and age are computed below.

Code

cov(df$LungCap, df$Age)

[1] 8.738289

Code

cor(df$LungCap, df$Age)

[1] 0.8196749
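As a quick consistency check on the two numbers above, recall that the correlation is the covariance rescaled by the two sample standard deviations. The standard deviations themselves are not printed in this post, so the right-hand side only shows what their product is implied to be:

$$
r = \frac{\operatorname{cov}(\text{LungCap}, \text{Age})}{s_{\text{LungCap}}\, s_{\text{Age}}}
\quad\Longrightarrow\quad
s_{\text{LungCap}}\, s_{\text{Age}} = \frac{8.738289}{0.8196749} \approx 10.66
$$

A correlation of about 0.82 points to a strong positive linear association: lung capacity tends to increase with age in this sample.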

Question 2

Code

X <- c(0:4)
Frequency <- c(128, 434, 160, 64, 24)
df <- data.frame(X, Frequency)
df

  X Frequency
1 0       128
2 1       434
3 2       160
4 3        64
5 4        24

As the table above shows, the most common number of prior convictions is 1.

Dividing each frequency by the total of 810 gives the probability of each number of prior convictions. 810 is the sum of the Frequency column, which I checked manually.
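The division described above can also be sketched in base R, which lets the total be checked in code rather than by hand. The variable names `total` and `Probability` are only illustrative.

```r
# Frequency table from the question
X <- 0:4
Frequency <- c(128, 434, 160, 64, 24)

total <- sum(Frequency)            # total number of observations
Probability <- Frequency / total   # relative frequency of each value of X

print(total)
print(round(Probability, 4))
print(sum(Probability))            # should be 1, up to floating-point error
```

The probabilities should sum to 1, which is a cheap sanity check before using them in later steps.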

Code

library(dplyr)  # provides mutate(), filter() and the %>% pipe
df2 <- mutate(df, Probability = Frequency / sum(Frequency))

Code

df2

  X Frequency Probability
1 0       128  0.15802469
2 1       434  0.53580247
3 2       160  0.19753086
4 3        64  0.07901235
5 4        24  0.02962963

The probability of exactly 2 prior convictions can be read directly from the row X = 2: 160/810, or about 0.1975.

Probability of fewer than 2 prior convictions:

Code

b2 <- df2 %>% 
  filter(X < 2)

Code

sum(b2$Probability)

[1] 0.6938272

Probability of 2 or fewer prior convictions:

Code

c2 <- df2 %>% 
  filter(X <= 2)

Code

sum(c2$Probability)

[1] 0.891358

Probability of more than 2 prior convictions:

Code

d2 <- df2 %>% 
  filter(X > 2)

Code

sum(d2$Probability)

[1] 0.108642
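The same probabilities can be cross-checked without dplyr by accumulating the distribution. This is a minimal base R sketch; the names `cdf`, `p_exactly_2`, `p_fewer_2`, `p_at_most_2` and `p_more_2` are illustrative and not part of the original analysis.

```r
# Frequency table from the question
X <- 0:4
Frequency <- c(128, 434, 160, 64, 24)
Probability <- Frequency / sum(Frequency)

# Cumulative distribution: P(X <= 0), P(X <= 1), ..., P(X <= 4)
cdf <- cumsum(Probability)
names(cdf) <- X

p_exactly_2 <- Probability[X == 2]      # P(X = 2)
p_fewer_2   <- unname(cdf["1"])         # P(X < 2) = P(X <= 1)
p_at_most_2 <- unname(cdf["2"])         # P(X <= 2)
p_more_2    <- 1 - unname(cdf["2"])     # P(X > 2), as the complement of P(X <= 2)

print(c(exactly_2 = p_exactly_2, fewer_2 = p_fewer_2,
        at_most_2 = p_at_most_2, more_2 = p_more_2))
```

Computing P(X > 2) as a complement is a useful check, since it has to agree with summing the probabilities for X = 3 and X = 4.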

What is the expected value of the number of prior convictions?

Code

# weighted.mean() normalizes the weights, so the raw frequencies can be used directly
e <- weighted.mean(df$X, df$Frequency)

Code

e

[1] 1.28642
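For reference, the expected value can also be worked out by hand from the frequency table, weighting each number of prior convictions by its relative frequency:

$$
E[X] = \sum_{x=0}^{4} x \, P(X = x)
     = \frac{0(128) + 1(434) + 2(160) + 3(64) + 4(24)}{810}
     = \frac{1042}{810} \approx 1.286
$$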

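Variance and standard deviation of the number of prior convictions. Because each value of X occurs with a different frequency, the squared deviations from the mean are weighted by Frequency.

Code

mu <- weighted.mean(df$X, df$Frequency)  # expected value of X
v <- sum(df$Frequency * (df$X - mu)^2) / sum(df$Frequency)
v

[1] 0.8562353

Code

sqrt(v)

[1] 0.9253298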
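As a hand check, and treating the frequency table as the probability distribution of X (so dividing by 810 rather than 809), the variance follows from the first two moments:

$$
E[X^2] = \frac{0^2(128) + 1^2(434) + 2^2(160) + 3^2(64) + 4^2(24)}{810} = \frac{2034}{810} \approx 2.511
$$

$$
\operatorname{Var}(X) = E[X^2] - \left(E[X]\right)^2 = \frac{2034}{810} - \left(\frac{1042}{810}\right)^2 \approx 0.856,
\qquad
\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)} \approx 0.925
$$

Note that these values weight each number of prior convictions by its frequency; the unweighted variance of the five values 0 to 4 alone is 2.5 and does not describe this distribution.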